{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 内置图分析算法\n",
    "\n",
    "图分析在真实世界中有广泛的用处。许多算法比如社区发现，路径连通性，中心性算法都被证明在许多商业中非常有用。\n",
    "\n",
    "GraphScope 内置了一系列算法，使得用户可以方便的在图数据上做分析。\n",
    "\n",
    "这个教程展示了如何使用内置算法来完成图分析任务。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install graphscope package if you are NOT in the Playground\n",
    "\n",
    "!pip3 install graphscope"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 准备工作\n",
    "\n",
    "首先，我们需要载入一张属性图。\n",
    "\n",
    "这里我们从 [Gnutella peer-to-peer network, August 31 2002](http://snap.stanford.edu/data/p2p-Gnutella31.html)\n",
    "使用 peer-to-peer network 数据集，并在此基础上为点和边加入了属性。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the graphscope module.\n",
    "\n",
    "import graphscope\n",
    "\n",
    "graphscope.set_option(show_log=False)  # enable logging"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load p2p network dataset\n",
    "\n",
    "from graphscope.dataset import load_p2p_network\n",
    "\n",
    "graph = load_p2p_network(directed=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "来看一下图的数据模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(graph.schema)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如上所示，我们载入了一张属性图，点标签为 \"host\"，有两个属性 (\"weight\" 和 \"id\")，边标签为 \"connect\"， 有三个属性，分别为 (\"src_label_id\", \"dst_label_id\", \"dist\")。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 投影为简单图\n",
    "\n",
    "许多图分析算法都只能查询 **简单图**， 在这里我们定义 **简单图** 为只包含一种点和一种边，且点和边最多只有一个属性。\n",
    "\n",
    "`GraphScope` 提供了一个函数 `project` 来将属性图投影为简单图，我们可以选择某一种点和边，以及其一个或零个属性，来获得属性图的一个 **投影**。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "simple_graph = graph.project(vertices={\"host\": []}, edges={\"connect\": [\"dist\"]})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 运行算法\n",
    "\n",
    "在接下来的部分，我们将运行几个算法，并查看结果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 单源最短路径\n",
    "\n",
    "单源最短路径算法（Single Source Shortest Path，接下来将以 `sssp` 代称），需要两个参数，第一个参数是 `graph`，第二个参数是查询的出发点 `src`。\n",
    "\n",
    "在这里，我们将查询从 ID 为 6 的点出发到所有点的最短路径长度。\n",
    "\n",
    "在后端，GraphScope 会生成一个兼容被查询的图的算法，编译为一个动态链接库。额外的 GraphScope 会对类型做一些检查，比如，在这里 `sssp` 算法要求边有一个 `int` 或 `double` 的属性作为距离。\n",
    "\n",
    "第一次编译动态链接库时需要一些时间，但是对于同一个算法，这一步在同样类型的图上只会进行一次。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from graphscope import sssp\n",
    "\n",
    "sssp_context = sssp(simple_graph, src=6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "完成计算后，计算结果分布式的存储到了集群上 `vineyard` 的实例上。\n",
    "\n",
    "算法会返回一个 `Context` 对象，包含若干种取回结果的方法。\n",
    "\n",
    "关于 `Context` 的更多信息，请参照 [Context](https://graphscope.io/docs/reference/context.html)\n",
    "\n",
    "在这里，结果表示从起始点出发到所有点的最短路径，我们使用返回的对象来取得一部分结果，以及取回点ID一并展示。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sssp_context.to_dataframe(\n",
    "    selector={\"id\": \"v.id\", \"dist\": \"r\"}, vertex_range={\"begin\": 1, \"end\": 10}\n",
    ").sort_values(by=\"id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "另外， 我们还可以将结果输出到客户端的机器的文件系统上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sssp_context.output_to_client(\"./sssp_result.csv\", selector={\"id\": \"v.id\", \"dist\": \"r\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看一下输出的文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!head ./sssp_result.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PageRank\n",
    "\n",
    "PageRank 是非常著名的一个图分析算法，让我们来看一下如何仅用两行计算 PageRank。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from graphscope import pagerank\n",
    "\n",
    "pr_context = pagerank(simple_graph, delta=0.85, max_round=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pr_context.to_dataframe(\n",
    "    selector={\"id\": \"v.id\", \"rank\": \"r\"}, vertex_range={\"begin\": 1, \"end\": 10}\n",
    ").sort_values(by=\"id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "source": [
    "输出结果到本地。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pr_context.output_to_client(\n",
    "    \"./pagerank_result.csv\", selector={\"id\": \"v.id\", \"rank\": \"r\"}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 弱联通分量\n",
    "\n",
    "在图理论中，无向图中的一个分量，有时被称为一个联通分量，代表一个任意两个节点之间都有边的子图，且这个子图没有指向子图外的点的边。\n",
    "\n",
    "弱联通分量算法（Weakly Connected Components， WCC）计算每个点所属的联通分量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from graphscope import wcc\n",
    "\n",
    "wcc_context = wcc(simple_graph)\n",
    "wcc_context.to_dataframe(\n",
    "    selector={\"id\": \"v.id\", \"cc\": \"r\"}, vertex_range={\"begin\": 1, \"end\": 10}\n",
    ").sort_values(by=\"id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "请查阅 [Builtin algorithms](https://graphscope.io/docs/analytics_engine.html#built-in-algorithms) 来获得更多的内置算法的信息，并欢迎试用更多的算法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
